Retrofitting Word Vectors to Semantic Lexicons

Introduction

This paper proposes a method for refining vector space representations using relational information from semantic lexicons by encouraging linked words to have similar vector representations.

It makes no assumptions about how the input vectors were constructed.

The contribution of this paper is a graph-based learning technique for using lexical relational resources to obtain higher-quality semantic vectors, which the authors call retrofitting.

It is a post-processing step that runs belief propagation on a graph constructed from lexicon-derived relational information to update the word vectors.

The new vectors are encouraged to be:

  1. similar to the vectors of related word types
  2. similar to their purely distributional representation

They show that retrofitting gives consistent improvements on the evaluation benchmarks across word vectors of different lengths.

Retrofitting with Semantic Lexicons

$\hat{Q}$ is the matrix of original word vectors. The objective is to learn the matrix $Q=(q_1,\ldots,q_n)$ such that the columns are both close (under a distance metric) to their counterparts in $\hat{Q}$ and to adjacent vertices in the lexicon graph $\Omega=(V,E)$.

$\Psi(Q)=\sum_{i=1}^{n}\Big[\alpha_i||q_i-\hat{q}_i||^2 + \sum_{j:(i,j)\in E} \beta_{ij} ||q_i-q_j||^2\Big]$


Taking the first derivative of $\Psi$ with respect to one $q_i$ vector and equating it to zero yields the following online update:


$q_i=\frac{\sum_{j:(i,j) \in E} \beta_{ij}q_j+\alpha_i \hat{q}_i}{\sum_{j:(i,j) \in E} \beta_{ij}+\alpha_i}$
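As a concrete illustration, here is a minimal sketch of the resulting algorithm in Python. The function name and the dictionary-based data layout are conventions of this sketch rather than the authors' released code; the weights $\alpha_i = 1$ and $\beta_{ij} = \text{degree}(i)^{-1}$ are the values the paper uses in its experiments.

```python
import numpy as np

def retrofit(q_hat, edges, alpha=1.0, n_iters=10):
    """Iteratively apply the closed-form update above to every word.

    q_hat : dict mapping word -> original vector (numpy array)
    edges : dict mapping word -> list of lexicon neighbours
    alpha : weight keeping q_i close to its original vector q_hat_i
    """
    q = {w: v.copy() for w, v in q_hat.items()}
    for _ in range(n_iters):
        for w, neighbours in edges.items():
            if w not in q:
                continue
            nbrs = [n for n in neighbours if n in q]
            if not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # beta_ij = degree(i)^-1, as in the paper
            # Closed-form update: weighted average of lexicon neighbours
            # and the original (purely distributional) vector.
            q[w] = (beta * sum(q[n] for n in nbrs) + alpha * q_hat[w]) \
                   / (beta * len(nbrs) + alpha)
    return q
```

The paper reports that around ten iterations of this procedure suffice for the vectors to converge.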

Semantic Lexicons during Learning

Retrofitting is applied as a second stage, after the vectors have been learned. An alternative baseline incorporates the lexicon information during training itself, where the semantic lexicon plays the role of a prior on $Q$, defined as follows:


$p(Q) \propto \exp\big(-\gamma \sum_{i=1}^n \sum_{j:(i,j)\in E} \beta_{ij}||q_i-q_j||^2\big)$
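With this prior, training maximizes a MAP objective that combines the corpus log-likelihood with the log-prior. Schematically (the exact form of the likelihood term depends on the underlying vector model, LBL in the paper's experiments):

$C(Q) = \sum_{i} \log p(w_i \mid \text{context}_i; Q) - \gamma \sum_{i=1}^n \sum_{j:(i,j)\in E} \beta_{ij}||q_i-q_j||^2$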

Evaluation Benchmarks

Word Similarity

  • WS-353
  • RG-65
  • MEN
For each test item, they compute the cosine similarity between the vectors of the two words and report Spearman's rank correlation coefficient between the model's rankings and the human rankings, as sketched below.
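A minimal scoring sketch, assuming word vectors in a dict and test items as (word1, word2, human_score) triples; the function name and data layout are conventions of this sketch:

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_word_similarity(vectors, test_items):
    """Score a benchmark like WS-353: cosine similarity vs. human ratings.

    vectors    : dict mapping word -> numpy array
    test_items : list of (word1, word2, human_score) triples
    """
    model_scores, human_scores = [], []
    for w1, w2, human in test_items:
        if w1 not in vectors or w2 not in vectors:
            continue  # skip out-of-vocabulary pairs
        v1, v2 = vectors[w1], vectors[w2]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        model_scores.append(cos)
        human_scores.append(human)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```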

Syntactic Relations (SYN-REL)

The task is to find a word $d$ that best fits the following relationship: “$a$ is to $b$ as $c$ is to $d$”, given $a$, $b$, and $c$.
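A common way to solve this is the vector-offset method, returning the vocabulary word closest to $q_b - q_a + q_c$; a minimal sketch (illustrative names, and whether the paper uses exactly this variant is an assumption here):

```python
import numpy as np

def solve_analogy(vectors, a, b, c):
    """Return the word d maximizing cosine similarity to (b - a + c).

    vectors : dict mapping word -> numpy array
    Excludes a, b, and c themselves from the candidate set.
    """
    target = vectors[b] - vectors[a] + vectors[c]
    target /= np.linalg.norm(target)
    best_word, best_sim = None, -np.inf
    for w, v in vectors.items():
        if w in (a, b, c):
            continue
        sim = np.dot(target, v) / np.linalg.norm(v)
        if sim > best_sim:
            best_word, best_sim = w, sim
    return best_word
```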

Synonym Selection (TOEFL)

Sentiment Analysis (SA)

Experiments

Among the semantic lexicons, retrofitting with PPDB gives the best results.

They compare LBL, LBL+lazy, LBL+periodic, and LBL+retrofitting; LBL+lazy and LBL+periodic incorporate the lexicon prior during training.

The lazy method computes, once every $k$ words, the sum of the gradients of the log-likelihood and the log-prior for those $k$ words in one go.
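A sketch of what one such lazy step could look like, assuming gradient ascent on the log-prior for the words just processed; the learning rate, names, and uniform $\beta$ are assumptions of this sketch:

```python
import numpy as np

def lazy_prior_step(q, edges, recent_words, gamma=1.0, beta=1.0, lr=0.05):
    """One lazy MAP step: after processing k words, nudge their vectors
    toward their lexicon neighbours by ascending the log-prior.

    d/dq_i log p(Q) = -2 * gamma * sum_j beta * (q_i - q_j)
    """
    for w in set(recent_words):
        if w not in q or w not in edges:
            continue
        nbrs = [n for n in edges[w] if n in q]
        if not nbrs:
            continue
        grad = -2.0 * gamma * beta * sum(q[w] - q[n] for n in nbrs)
        q[w] += lr * grad  # gradient ascent, updating in place
```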

The periodic method instead updates all of the word vectors at regular intervals using the retrofitting update (Equation 1).
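Reusing the `retrofit` function from the sketch above, the periodic variant could be wrapped as follows; the trainer callable, the interval, and the choice to anchor on the current vectors are all assumptions of this sketch:

```python
def train_with_periodic_refresh(q, edges, batches, train_step, period=10):
    """Sketch of the periodic variant: ordinary likelihood training,
    with a full retrofitting pass every `period` batches.

    train_step : any callable doing one log-likelihood update on q in place
    """
    for t, batch in enumerate(batches, start=1):
        train_step(q, batch)
        if t % period == 0:
            # Refresh all vectors with the retrofitting update, treating
            # the current vectors as the anchor (an assumption here).
            q.update(retrofit(q, edges, n_iters=1))
    return q
```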

The difference between LBL's objective and word2vec's is that the latter uses a softmax.

Analysis

If the vectors are modified during learning, the periodic method is the better of the two. The retrofitting method proposed in this paper, however, performs better than both lazy and periodic.
